Abstract. Let $(X_t)_{t\in T}$ be a family of real-valued centered random variables
indexed by a countable set $T$. In the first part of this paper, we establish exponential
bounds for the deviation probabilities of the supremum $Z=\sup_{t\in T}X_t$
by using the generic chaining device introduced in Talagrand (1995). Compared
to concentration-type inequalities, these bounds have the advantage of
holding under weaker conditions on the family $(X_t)_{t\in T}$. The second part of
the paper is oriented towards statistics. We consider the regression setting
$Y=f+\xi$ where $f$ is an unknown vector of $\mathbb{R}^n$ and $\xi$ is a random vector
whose components are independent, centered and admit finite Laplace
transforms in a neighborhood of 0. Our aim is to estimate $f$ from the observation
of $Y$ by means of a model selection approach among a collection of linear
subspaces of $\mathbb{R}^n$. The selection procedure we propose is based on the
minimization of a penalized criterion whose penalty is calibrated by using the
deviation bounds established in the first part of this paper. More precisely,
we study suprema of random variables of the form $X_t=\sum_{i=1}^n t_i\xi_i$ when $t$
varies among the unit ball of a linear subspace of $\mathbb{R}^n$. We finally show that
our estimator satisfies some oracle-type inequality under suitable assumptions
on the metric structures of the linear spaces of the collection.



1. INTRODUCTION 

1.1. What is this paper about? The present paper contains two parts. The first
one is oriented towards probability. We consider a family $(X_t)_{t\in T}$ of real-valued
centered random variables indexed by a countable set $T$ and give an exponential
bound for the probability of deviation of the supremum $Z=\sup_{t\in T}X_t$. The result
is established under the assumption that the Laplace transforms of the increments
$X_t-X_s$ for $s,t\in T$ satisfy some Bernstein-type bounds. This assumption is
convenient to handle simultaneously the case of subgaussian increments (which is the
typical case in the literature) as well as more "heavy tailed" ones for which the
Laplace transform of $(X_s-X_t)^2$ may be infinite in a neighborhood of 0. Under
additional assumptions on the $X_t$, our result allows one to recover (with worse constants)
some deviation bounds based on concentration-type inequalities of $Z$ around its
expectation. However, our general result cannot be deduced from those inequalities.
As we shall see, concentration-type inequalities could be false under the kind of
assumptions we consider on the family $(X_t)_{t\in T}$.

The second part is oriented towards statistics. We consider the regression framework

(1) $Y_i=f_i+\xi_i,\quad i=1,\dots,n$

where $f=(f_1,\dots,f_n)$ is an unknown vector of $\mathbb{R}^n$ and $\xi=(\xi_1,\dots,\xi_n)$ is a random
vector whose components are independent, centered and admit suitable exponential
moments. Our aim is to estimate $f$ from the observation of $Y=(Y_1,\dots,Y_n)$
by means of a model selection approach. More precisely, we start with a collection
$\mathbb{S}=\{S_m, m\in\mathcal{M}\}$ of finite dimensional linear spaces $S_m$ to each of which we
associate the least-squares estimator $\hat f_m\in S_m$ of $f$. From the same data $Y$, our aim is
to select some suitable estimator $\tilde f=\hat f_{\hat m}$ among the collection $\mathcal{F}=\{\hat f_m, m\in\mathcal{M}\}$
in such a way that the (squared) Euclidean risk of $\tilde f$ is as close as possible to the
infimum of the risks over $\mathcal{F}$. The selection procedure we propose is based on the
minimization of a penalized criterion whose penalty is calibrated by using
the deviation bounds established in the first part of this paper. More precisely, the
penalty is obtained by studying the deviations of $\chi^2$-type random variables, that
is, random variables of the form $|\Pi_S\xi|_2^2$, where $|\cdot|_2$ denotes the Euclidean norm and
$\Pi_S$ is the orthogonal projector onto a linear subspace $S$ of $\mathbb{R}^n$. To our knowledge,
these deviation bounds in probability are new. We finally show that $\tilde f$ satisfies
some oracle-type inequality under suitable assumptions on the metric structures of
the $S_m$.

In the following sections, we situate the results of the present paper within the 
literature. 

1.2. Controlling suprema of random processes. Among the most common 
deviation inequalities, let us recall 

Theorem 1 (Bernstein's inequality). Let $X_1,\dots,X_n$ be independent random
variables and set $X=\sum_{i=1}^n(X_i-\mathbb{E}(X_i))$. Assume that there exist nonnegative
numbers $v,c$ such that for all $k\ge3$

(2) $\sum_{i=1}^n\mathbb{E}\left[|X_i|^k\right]\le\frac{k!}{2}v^2c^{k-2}$,

then for all $u\ge0$

(3) $\mathbb{P}\left(X\ge\sqrt{2v^2u}+cu\right)\le e^{-u}$.

Besides, for all $x\ge0$,

(4) $\mathbb{P}(X\ge x)\le\exp\left[-\frac{x^2}{2(v^2+cx)}\right]$.

In the literature, (2) together with the fact that the $X_i$ are independent is
sometimes replaced by the weaker condition

(5) $\mathbb{E}\left(e^{\lambda X}\right)\le\exp\left[\frac{\lambda^2v^2}{2(1-\lambda c)}\right],\quad\forall\lambda\in(0,1/c)$

with the convention $1/0=+\infty$. Bernstein's inequality allows one to derive deviation
inequalities for a large class of distributions among which the Poisson, Laplace,
Gamma or the Gaussian distributions (once suitably centered). In this latter
case, (5) holds with $c=0$. Another situation of interest is the case where the
$X_i$ are i.i.d. with values in $[-c,c]$. Then (2) and (5) hold with $v^2=\mathrm{var}(X_1)$.
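
As a small numerical illustration (not part of the original text), the following Python sketch evaluates the deviation level of (3) and the tail bound of (4). Plugging $x=\sqrt{2v^2u}+cu$ into (4) gives a bound that is never smaller than $e^{-u}$, because $x^2-2u(v^2+cx)=-(cu)^2\le0$; the two coincide when $c=0$. The function names and the parameter values are assumptions made for this example only.

```python
import math

def bernstein_quantile(v, c, u):
    """Deviation level sqrt(2 v^2 u) + c u appearing in (3)."""
    return math.sqrt(2.0 * v * v * u) + c * u

def bernstein_tail(v, c, x):
    """Right-hand side exp(-x^2 / (2 (v^2 + c x))) of (4)."""
    return math.exp(-x * x / (2.0 * (v * v + c * x)))

# Hypothetical parameter values, chosen only for illustration.
v, c = 1.5, 0.7
for u in (0.1, 1.0, 5.0, 20.0):
    x = bernstein_quantile(v, c, u)
    # Algebraic identity: x^2 - 2u(v^2 + c x) = -(c u)^2 <= 0,
    # so the exponent in (4) evaluated at x is at most u.
    gap = x * x - 2.0 * u * (v * v + c * x)
    print(u, x, gap, bernstein_tail(v, c, x), math.exp(-u))
```

In this sense (3) is the sharper of the two statements, while (4) is the form that is explicit in the deviation $x$.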

In recent years, much effort has been devoted to extending these bounds to the
deviations of suprema $Z$ of random variables $X_t$. When $T$ is a (countable) bounded
subset of a metric space $(\mathcal{X},d)$, a common technique is to use a chaining device.
This approach seems to go back to Kolmogorov and was very popular in statistics
in the 90s to control suprema of empirical processes with regard to the entropy of
$T$, see van de Geer (1990) for example. However, this approach leads to pessimistic
numerical constants that are in general too large to be used in statistical procedures.
An alternative to chaining is the use of concentration inequalities. For example,
when the $X_t$ are Gaussian, for all $u\ge0$ we have

(6) $\mathbb{P}\left(Z\ge\mathbb{E}(Z)+\sqrt{2v^2u}\right)\le e^{-u}$ where $v^2=\sup_{t\in T}\mathrm{var}(X_t)$.

This inequality is due to Sudakov & Cirel'son (1974). Compared to chaining, (6) 
provides a powerful tool for controlling suprema of Gaussian processes as soon as 
one is able to evaluate E(Z) sharply enough. 

It is the merit of Talagrand (1995) to extend this approach for the purpose of
controlling suprema of bounded empirical processes, that is, for $X_t$ of the form
$X_t=\sum_{i=1}^n\left(t(\xi_i)-\mathbb{E}(t(\xi_i))\right)$ where $\xi_1,\dots,\xi_n$ are independent random variables and
$T$ is a set of uniformly bounded functions, say with values in $[-c,c]$. From Talagrand's
inequality, one can deduce deviation bounds with respect to $\mathbb{E}(Z)$ of the form

(7) $\mathbb{P}\left(Z\ge C\left(\mathbb{E}(Z)+\sqrt{v^2u}+cu\right)\right)\le\exp(-u)$ for all $u\ge0$

where $v^2=\sup_{t\in T}\mathrm{var}(X_t)$ and $C$ is a positive numerical constant. Apart from the
constants, (7) and (3) have a similar flavor even though the boundedness assumption
on the elements of $T$ seems too strong compared to conditions (2) or (5).

As the original result by Talagrand involved suboptimal numerical constants,
many efforts were made to recover it with sharper ones. A first step in this
direction is due to Ledoux (1996) by means of nice entropy and tensorisation
arguments. Then, further refinements were made on Ledoux's result by Massart (2000),
Rio (2002) and Bousquet (2002), the latter author achieving the best possible
result in terms of constants. For a nice introduction to these inequalities (and their
applications to statistics) we refer the reader to the book by Massart (2007). Other
improvements upon (7) have been made in recent years. In particular Klein &
Rio (2005) generalized the result to the case

(8) $X_t=\sum_{i=1}^nX_{i,t}$

where for each $t\in T$, the $(X_{i,t})_{i=1,\dots,n}$ are independent (but not necessarily i.i.d.)
centered random variables with values in $[-c,c]$.

In the present paper, the result we establish holds under different assumptions
than the ones leading to inequalities such as (7). First, as pointed out by Jonas
Kahn, an inequality such as (7) could be false under the kind of assumptions we
consider on the family $(X_t)_{t\in T}$. In the counter-example we give in Section 2 (it
is a slight modification of the one Jonas Kahn gave to us), we see that $Z$ may
deviate from $\mathbb{E}(Z)$ on a set whose probability may not be exponentially
small. Moreover, even in the more common situation where $X_t$ is of the form (8),
we establish deviation inequalities that are available for possibly unbounded random
variables $X_{i,t}$, which is beyond the scope of the concentration inequalities proven in
Bousquet (2002) and Klein & Rio (2005).

Even though it was originally introduced to bound $\mathbb{E}(Z)$ from above, generic
chaining as described in Talagrand's book (2005) provides another way of establishing
deviation bounds for $Z$. Talagrand's approach relies on the idea of decomposing
$T$ into partitions rather than into nets as was usually done before with the classical
chaining device. Denoting by $e_1,\dots,e_k$ the canonical basis of $\mathbb{R}^k$ and by $\xi_1,\dots,\xi_k$
i.i.d. random vectors of $\mathbb{R}^n$ with common distribution $\mu$, generic chaining was used
in Mendelson et al (2007) and Mendelson (2008) to study the properties of the random
operator $\Gamma:t\mapsto k^{-1/2}\sum_{i=1}^k\langle\xi_i,t\rangle e_i$ defined for $t$ in the unit sphere $T$ of $\mathbb{R}^n$
(which we endow with its usual scalar product $\langle.,.\rangle$). Their results rely on the
control of suprema of random variables of the form $X_t=k^{-1}\sum_{i=1}^k\langle\xi_i,t\rangle$ for $t\in T$.
When $k=1$, this form of $X_t$ is analogous to the one we consider in our statistical
application. However, the deviation bounds obtained in Mendelson et al (2007) and
Mendelson (2008) require that $\mu$ be subgaussian, which we do not want to assume
here. Closer to our result is Theorem 3.3 in Klartag & Mendelson (2005) which
bounds, on a set of probability at least $1-\delta$ (for some $\delta\in(0,1)$), the supremum
$Z=\sup_{t\in T}|X_t|$. Unfortunately, their bound involves non-explicit constants (that
depend on $\delta$) which makes it useless for statistical issues.

Our approach also uses generic chaining. With such a technique, the inequalities 
we get suffer from the usual drawback that the numerical constants are non-optimal 
but at least allow a suitable control of the $\chi^2$-type random variables we consider in
the statistical part of this paper. To our knowledge, these inequalities are new. 

1.3. From the control of $\chi^2$-type random variables to model selection in
regression. The reason why $\chi^2$-type random variables naturally emerge in the
regression setting is the following one. Let $S$ be a linear subspace of $\mathbb{R}^n$. The
classical least-squares estimator of $f$ in $S$ is given by $\hat f=\Pi_SY=\Pi_Sf+\Pi_S\xi$ and
since the (squared) Euclidean distance between $f$ and $\hat f$ decomposes as

$|f-\hat f|_2^2=|f-\Pi_Sf|_2^2+|\Pi_S\xi|_2^2$,

the study of the quadratic loss $|f-\hat f|_2^2$ requires that of its random component
$|\Pi_S\xi|_2^2$. This quantity is called a $\chi^2$-type random variable by analogy to the
Gaussian case. Its study is connected to that of suprema of random variables by the
formula

(9) $|\Pi_S\xi|_2=\sup_{t\in T}X_t=Z$ with $X_t=\sum_{i=1}^n\xi_it_i$

where $T$ is the unit ball of $S$ (or a countable and dense subset of it). The control
of such random variables is at the heart of the model selection scheme. When $\xi$ is
a standard Gaussian vector of $\mathbb{R}^n$, Birgé & Massart (2001) used (6) to control the
probability of deviation of $|\Pi_S\xi|_2$ with respect to its expectation. The strong
integrability properties of the $\xi_i$ allow one to handle very general collections of models. By
using chaining techniques, these results were extended to the subgaussian case (that
is, for $\pm\xi_i$ satisfying (5) with $c=0$ for all $i$) in Baraud, Comte & Viennet (2001).
Similarly, very few assumptions were required on the collection to perform model
selection. Baraud (2000) considered the case where the $\xi_i$ only admit a few finite
moments. There, the weak integrability properties of the $\xi_i$ induced severe
restrictions on the collection of models $\mathbb{S}$. Typically, for all $D\in\{1,\dots,n\}$ the number of
models $S_m$ of a given dimension $D$ had to be at most polynomial with respect to
$D$, the degree of the polynomial depending on the number of finite moments of $\xi_1$.
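
Identity (9) can be checked numerically. The Python sketch below is only an illustration under assumed choices (a block-constant orthonormal basis, a Gaussian vector $\xi$, a fixed seed): it computes $|\Pi_S\xi|_2$ from the coordinates $\langle u_j,\xi\rangle$, verifies that the value is attained at $t^*=\Pi_S\xi/|\Pi_S\xi|_2$, and checks that randomly drawn unit vectors of $S$ never exceed it.

```python
import math
import random

def inner(x, y):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def block_basis(D, k):
    """Orthonormal family u_1..u_D of R^(kD): u_j is constant on the j-th block."""
    n = D * k
    return [[1.0 / math.sqrt(k) if j * k <= i < (j + 1) * k else 0.0
             for i in range(n)] for j in range(D)]

def projection_norm(basis, xi):
    """|Pi_S xi|_2 computed from the coordinates <u_j, xi>."""
    return math.sqrt(sum(inner(u, xi) ** 2 for u in basis))

rng = random.Random(0)          # fixed seed, arbitrary choice
D, k = 3, 4                     # hypothetical dimensions, n = 12
n = D * k
basis = block_basis(D, k)
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
target = projection_norm(basis, xi)

# The supremum in (9) is attained at t* = Pi_S xi / |Pi_S xi|_2.
coords = [inner(u, xi) / target for u in basis]
t_star = [sum(cj * u[i] for cj, u in zip(coords, basis)) for i in range(n)]
attained = inner(t_star, xi)

# Random unit vectors of S stay below the projection norm (Cauchy-Schwarz).
best_random = -math.inf
for _ in range(1000):
    g = [rng.gauss(0.0, 1.0) for _ in range(D)]
    norm_g = math.sqrt(sum(a * a for a in g))
    t = [sum((gj / norm_g) * u[i] for gj, u in zip(g, basis)) for i in range(n)]
    best_random = max(best_random, inner(t, xi))
```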

To our knowledge, the intermediate case where the random variables $\pm\xi_i$ admit
exponential moments of the form (5) for all $i$ (with $c\neq0$ to exclude the already
known subgaussian case) has remained open for general collections of models. In
this context, the concentration-type inequality obtained in Klein & Rio (2005)
cannot be used to control $|\Pi_S\xi|_2$ as it would require that the $\xi_i$ be bounded. An
attempt at relaxing this boundedness assumption on the $\xi_i$ can be found in
Bousquet (2003). There, the author considered the situation where $T$ is a subset of
$[-1,1]^n$ and the $\xi_i$ are independent and centered random variables satisfying

(10) $\mathbb{E}\left[|\xi_i|^k\right]\le\frac{k!}{2}\sigma^2c^{k-2},\quad\forall k\ge2$.



Note that (10) implies (5) with $v^2=v^2(t)=|t|_2^2\sigma^2$. The result by Bousquet
provides an analogue of (7) with $v^2$ replaced by $n\sigma^2$ although one would expect the
smaller (and usual) quantity $v^2=\sup_{t\in T}v^2(t)$. Because of this, the resulting
inequality turns out to be useless at least for the statistical application we have in
mind. This fact has already been pointed out by Marie Sauvé in Sauvé (2008).
Sauvé also tackled the problem of model selection when the $\xi_i$ satisfy (10).
Compared to Baraud (2000), her condition on the collection of models is weaker in the
sense that the number of models with a given dimension $D$ is allowed to be
exponentially large with respect to $D$. However, the collection she considered only
consists of linear spaces $S_m$ with a specific form (leading to regressogram
estimators). Besides, her selection procedure relied on a known upper bound on
$\max_{i=1,\dots,n}|f_i|$, which can be unrealistic in practice. Unlike Marie Sauvé's, our
procedure does not depend on such an upper bound and allows for more general linear
spaces $S_m$.



1.4. Organisation of the paper and main notations. The paper is organized 
as follows. We present our deviation bound for Z in Section 2. The statistical 
application is developed in Sections 3 and 4. In Section 3 we consider particular 
cases of collections S of interest, the general case being considered in Section 4. 
Section 5 is devoted to the proofs. 

Throughout the paper we assume that $n\ge2$ and use the following notations. We denote
by $e_1,\dots,e_n$ the canonical basis of $\mathbb{R}^n$ which we endow with the Euclidean inner
product denoted $\langle.,.\rangle$. For $x\in\mathbb{R}^n$, we set $|x|_2=\sqrt{\langle x,x\rangle}$, $|x|_1=\sum_{i=1}^n|x_i|$ and
$|x|_\infty=\max_{i=1,\dots,n}|x_i|$. The linear span of a family $u_1,\dots,u_k$ of vectors is denoted
by $\mathrm{Span}\{u_1,\dots,u_k\}$. The quantity $|I|$ is the cardinality of a finite set $I$. Finally, $\kappa$
denotes the numerical constant 18. It appears first in the control of the deviation
of $Z$ when applying Talagrand's chaining argument and then throughout the paper. It
seemed interesting to stress the influence of this constant in the model selection
procedure we propose.

2. A Talagrand-type chaining argument for controlling suprema of random variables

Let $(X_t)_{t\in T}$ be a family of real-valued and centered random variables indexed by
a countable and nonempty set $T$. Fix some $t_0$ in $T$ and set

$Z=\sup_{t\in T}(X_t-X_{t_0})$ and $\bar Z=\sup_{t\in T}|X_t-X_{t_0}|$.

Our aim is to give a probabilistic control of the deviations of $Z$ (and $\bar Z$). We make
the following assumptions.

Assumption 1. There exist two distances $d$ and $\delta$ on $T$ and a nonnegative constant
$c$ such that for all $s,t\in T$ ($s\neq t$)

(11) $\mathbb{E}\left[e^{\lambda(X_t-X_s)}\right]\le\exp\left[\frac{\lambda^2d^2(s,t)}{2(1-\lambda c\delta(s,t))}\right],\quad\forall\lambda\in\left[0,\frac{1}{c\delta(s,t)}\right)$

with the convention $1/0=+\infty$.

Note that $c=0$ corresponds to the particular situation where the increments of
the process $X_t$ are subgaussian.

Besides Assumption 1, we also assume in this section that $d$ and $\delta$ derive from
norms. This is the only case we need to consider to handle the statistical problem
described in Section 3. Nevertheless, a more general result with arbitrary distances
can be found in Section 5.

Assumption 2. Let $S$ be a linear space with finite dimension $D$ endowed with
two arbitrary norms denoted $\|\cdot\|_2$ and $\|\cdot\|_\infty$ respectively. Define for $s,t\in S$,
$d(s,t)=\|t-s\|_2$ and $\delta(s,t)=\|s-t\|_\infty$ and assume that for constants $v>0$ and
$b\ge0$,

$T\subset\left\{t\in S\ \middle|\ \|t-t_0\|_2\le v,\ c\|t-t_0\|_\infty\le b\right\}$.

Then, the following result holds.

Theorem 2. Under Assumptions 1 and 2,

(12) $\mathbb{P}\left[Z\ge\kappa\left(\sqrt{v^2(D+x)}+b(D+x)\right)\right]\le e^{-x},\quad\forall x\ge0$

with $\kappa=18$. Moreover

(13) $\mathbb{P}\left[\bar Z\ge\kappa\left(\sqrt{v^2(D+x)}+b(D+x)\right)\right]\le2e^{-x},\quad\forall x\ge0$.

Since $S$ is separable, the result easily extends to the case where $T\subset S$ is not
countable provided the paths $t\mapsto X_t$ are continuous with probability 1 (with respect
to $\|\cdot\|_2$ or $\|\cdot\|_\infty$, both norms being equivalent on $S$).
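
To get a feeling for the size of the deviation level in (12), here is a minimal Python sketch. The function names are ours, and the second function is merely a looser, "split" level obtained from $\sqrt{D+x}\le\sqrt D+\sqrt x$; any $Z$ exceeding the split level also exceeds the level of (12), so the probability bound $e^{-x}$ remains valid for it.

```python
import math

KAPPA = 18.0  # numerical constant kappa of Theorem 2

def theorem2_level(v, b, D, x):
    """Level kappa * (sqrt(v^2 (D + x)) + b (D + x)) appearing in (12)."""
    return KAPPA * (math.sqrt(v * v * (D + x)) + b * (D + x))

def theorem2_level_split(v, b, D, x):
    """Looser level separating D and x, using sqrt(D + x) <= sqrt(D) + sqrt(x)."""
    return KAPPA * (v * (math.sqrt(D) + math.sqrt(x)) + b * (D + x))

# Hypothetical values: v = 1, b = 0 (subgaussian case), D = 4, x = 5.
level = theorem2_level(1.0, 0.0, 4, 5.0)
split = theorem2_level_split(1.0, 0.0, 4, 5.0)
```

With $b=0$ the level grows like $\sqrt{D+x}$; the linear term $b(D+x)$ only appears when $c\neq0$.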

2.1. Connections with deviation inequalities with respect to $\mathbb{E}(Z)$. In
this section we make some connections between our bound (12) and inequalities (6)
and (7). Throughout this section, $T$ is the unit ball of the linear span $S$ of an orthonormal
system $\{u_1,\dots,u_D\}$. Both norms $|\cdot|_2$ and $|\cdot|_\infty$ being equivalent on $S$, we set

$\Lambda_2(S)=\sup_{t\in T\setminus\{0\}}\frac{|t|_\infty}{|t|_2}<+\infty$.

Note that $\Lambda_2(S)$ depends on the metric structure of $S$. In all cases, $\Lambda_2(S)\le1$,
this bound being achieved for $S=\mathrm{Span}\{e_1,\dots,e_D\}$ for example. However, $\Lambda_2(S)$
can be much smaller, equal to $\sqrt{D/n}$ for example, when $n=kD$ for some positive
integer $k$ and $u_j=(e_{(j-1)k+1}+\dots+e_{jk})/\sqrt k$ for $j=1,\dots,D$. The set $T$ fulfills
Assumption 2 with $t_0=0$, $d(s,t)=|s-t|_2$, $\delta(s,t)=|s-t|_\infty$, $v=1$ and
$b=c\Lambda_2(S)$. Let $\xi=(\xi_1,\dots,\xi_n)$ be a random vector of $\mathbb{R}^n$ with i.i.d. components of
common variance 1. We consider the process defined on $T$ by $X_t=\langle t,\xi\rangle$ and note
that in this case $Z=\sup_{t\in T}X_t=|\Pi_S\xi|_2$. Besides, by using Jensen's inequality

(14) $\mathbb{E}[Z]=\mathbb{E}\left[\sqrt{\sum_{j=1}^D\langle\xi,u_j\rangle^2}\right]\le\sqrt D$.

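The Gaussian case: Assume that the $\xi_i$ are standard Gaussian random variables.
On the one hand, since $\sup_{t\in T}\mathrm{var}(X_t)=1$ we deduce from Sudakov & Cirel'son's
bound (6) together with (14)

(15) $\mathbb{P}\left[Z\ge\sqrt D+\sqrt{2x}\right]\le e^{-x},\quad\forall x\ge0$.

On the other hand, since (5) holds with $c=0$, for all $s,t\in S$ and $\lambda\ge0$

$\mathbb{E}\left[e^{\lambda(X_t-X_s)}\right]=\prod_{i=1}^n\mathbb{E}\left[e^{\lambda\xi_i(t_i-s_i)}\right]\le\prod_{i=1}^n\exp\left[\frac{\lambda^2|t_i-s_i|^2}{2}\right]\le\exp\left[\frac{\lambda^2|t-s|_2^2}{2}\right]$.

Consequently, (11) holds with $c=0$ and one can apply Theorem 2 to get

(16) $\mathbb{P}\left[Z\ge\kappa\left(\sqrt D+\sqrt x\right)\right]\le\mathbb{P}\left[Z\ge\kappa\sqrt{D+x}\right]\le e^{-x},\quad\forall x\ge0$.

Apart from the numerical constants, it turns out that (15) and (16) are similar in
this case.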

The bounded case: Let us assume that the $\xi_i$ take their values in $[-a,a]$ for some
$a\ge1$. We can apply the bound given by Klein & Rio (2005) with $v=1$ and
$c=a\Lambda_2(S)$ in (7) which together with (14) gives for a suitable constant $C>0$,

(17) $\mathbb{P}\left[Z\ge C\left(\sqrt D+\sqrt x+a\Lambda_2(S)x\right)\right]\le\exp(-x)$ for all $x\ge0$.

When the $\xi_i$ are bounded, there are actually two ways of applying Theorem 2.
One relies on the fact that the random variables $\pm\xi_i$ satisfy (5) with $v=1$ and
$c=a$ for all $i$. Hence, whatever $s,t\in S$ and $\lambda<(a|s-t|_\infty)^{-1}$,

$\mathbb{E}\left[e^{\lambda(X_t-X_s)}\right]\le\prod_{i=1}^n\exp\left[\frac{\lambda^2|t_i-s_i|^2}{2(1-\lambda a|t_i-s_i|)}\right]\le\exp\left[\frac{\lambda^2|t-s|_2^2}{2(1-\lambda a|t-s|_\infty)}\right]$

so that Assumption 1 holds with $c=a$, and we get from Theorem 2 (applied with $v=1$
and $b=a\Lambda_2(S)$)

(18) $\mathbb{P}\left[Z\ge\kappa\left(\sqrt D+\sqrt x+a\Lambda_2(S)x+a\Lambda_2(S)D\right)\right]\le e^{-x},\quad\forall x\ge0$.

Inequalities (17) and (18) essentially differ by the fact that the latter involves
the extra term $a\Lambda_2(S)D$. Hence, we recover (17) only for those $S$ bearing some
specific metric structure for which $\Lambda_2(S)\le C'(a\sqrt D)^{-1}$ for some numerical constant
$C'>0$.

The other way of using Theorem 2 is to note that the random variables $\pm\xi_i$ are
subgaussian (because they are bounded) and therefore satisfy (5) with $v=a$ and
$c=0$. By arguing as in the Gaussian case, Assumption 1 holds with $d(s,t)=
a|s-t|_2$ for all $s,t\in S$ and $c=0$, and Assumption 2 is fulfilled with $v=a$ and $b=0$.
We deduce from Theorem 2

(19) $\mathbb{P}\left[Z\ge\kappa a\left(\sqrt D+\sqrt x\right)\right]\le e^{-x},\quad\forall x\ge0$.

Note that whenever $a$ is not too large compared to 1, this bound improves (17) by
avoiding the linear term $a\Lambda_2(S)x$.

2.2. A counter-example. In this section we show that the supremum $Z$ of
a random process $X=(X_t)_{t\in T}$ satisfying (11) may not concentrate around $\mathbb{E}(Z)$.
More precisely let us show that (7) could be false under (11). A simple
counter-example is the following one. For $D\ge1$, let $S=\mathrm{Span}\{e_1,\dots,e_D\}$, $T$ be the unit
ball of $S$ and $X'=(X'_t)_{t\in T}$ the Gaussian process defined for $t\in T$ by $X'_t=\langle t,\xi\rangle$
where $\xi$ is a standard Gaussian vector of $\mathbb{R}^n$. For $p\in(0,1)$, define $X$ as either $X'$
with probability $p$ or the process $X''$ identically equal to 0 with probability $1-p$.
On the one hand, note that both processes $X'$ and $X''$ satisfy (11) with $c=0$ and
$d(s,t)=|s-t|_2$ for all $s,t\in S$, and therefore so does $X$ (whatever $p$). On the other
hand, since

$\mathbb{E}(Z)=p\,\mathbb{E}\left[\sup_{t\in T}X'_t\right]=p\,\mathbb{E}\left[\sqrt{\sum_{i=1}^D\xi_i^2}\right]\le p\sqrt D$

and $\sup_{t\in T}\mathrm{var}(X_t)\le1$, (7) would imply that for some positive numerical constant
$C$ (that we can take larger than 1 with no loss of generality), whatever $p\in(0,1)$
and $u>0$,

$\mathbb{P}\left[Z\ge Cp\sqrt D+C(\sqrt u+u)\right]=p\,\mathbb{P}\left[\sqrt{\sum_{i=1}^D\xi_i^2}\ge Cp\sqrt D+C(\sqrt u+u)\right]\le e^{-u}$.

In particular, by taking $p=(2C)^{-1}\in(0,1)$ and $u=\log(2/p)$, we would get

$\mathbb{P}\left[\sqrt{\frac1D\sum_{i=1}^D\xi_i^2}\ge\frac12+\frac{C}{\sqrt D}\left(\sqrt{\log(2/p)}+\log(2/p)\right)\right]\le\frac12$

which is of course false by the law of large numbers for large values of $D$.



3. Applications to model selection in regression 

Consider the regression framework given by (1) and assume that for some known
nonnegative numbers $\sigma$ and $c$

(20) $\log\mathbb{E}\left[e^{\lambda\xi_i}\right]\le\frac{\lambda^2\sigma^2}{2(1-|\lambda|c)}$ for all $\lambda\in(-1/c,1/c)$ and $i=1,\dots,n$.

Inequality (20) holds for a large class of distributions (once suitably centered)
including Gaussian, Poisson, Laplace or Gamma (among others). Besides, (20) is
fulfilled when the $\xi_i$ satisfy (10) and therefore whenever these are bounded.

Our estimation strategy is based on model selection. We start with a (possibly
large) collection $\{S_m, m\in\mathcal{M}\}$ of linear subspaces (models) of $\mathbb{R}^n$ and associate to
each of these the least-squares estimator $\hat f_m=\Pi_{S_m}Y$. Given a penalty function
pen from $\mathcal{M}$ to $\mathbb{R}_+$, we define the penalized criterion crit(.) on $\mathcal{M}$ by

(21) $\mathrm{crit}(m)=|Y-\hat f_m|_2^2+\mathrm{pen}(m)$.

In this section, we establish risk bounds for the estimator of $f$ given
by $\hat f_{\hat m}$ where the index $\hat m$ is selected from the data among $\mathcal{M}$ as any minimizer of
crit(.).

In the sequel, the penalty pen will be based on some a priori choice of nonnegative
numbers $\{\Delta_m, m\in\mathcal{M}\}$ for which we set

$\Sigma=\sum_{m\in\mathcal{M}}e^{-\Delta_m}<+\infty$.

When $\Sigma=1$, the choice of the $\Delta_m$ can be viewed as that of a prior distribution
on the models $S_m$. For related conditions and their interpretation, see Barron and
Cover (1991) or Barron et al (1999).

In the following sections, we present some applications of our main result (to be
presented in Subsection 4.2) for some collections of linear spaces $\{S_m, m\in\mathcal{M}\}$ of
interest.



3.1. Selecting among histogram-type estimators. For a partition $m$ of $\{1,\dots,n\}$,
$S_m$ denotes the linear span of vectors of $\mathbb{R}^n$ whose coordinates are constant
on each element $I$ of $m$. In the sequel, we shall restrict to partitions $m$ whose elements
consist of consecutive integers.

Consider a partition $\mathfrak{m}$ of $\{1,\dots,n\}$ and $\mathcal{M}$ a collection of partitions $m$ such that
$S_m\subset S_{\mathfrak{m}}$. We obtain the following result.

Proposition 1. Let $a,b>0$. Assume that

(22) $|I|\ge a^2\log^2n,\quad\forall I\in\mathfrak{m}$.

If for some $K>1$,

(23) $\mathrm{pen}(m)\ge K\kappa^2\left(\sigma^2+\frac{2c(\sigma+c)(b+2)}{a\kappa}\right)(|m|+\Delta_m),\quad\forall m\in\mathcal{M}$,

the estimator $\hat f_{\hat m}$ satisfies

(24) $\mathbb{E}\left[|f-\hat f_{\hat m}|_2^2\right]\le C(K)\inf_{m\in\mathcal{M}}\left[\mathbb{E}\left[|f-\hat f_m|_2^2\right]+\mathrm{pen}(m)\right]+R$

where $C(K)$ is given by (30) and

$R=\kappa^2\left(\sigma^2+\frac{2c(\sigma+c)(b+2)}{a\kappa}\right)\Sigma+\frac{2(c+\sigma)^2(b+2)^2}{a^2n^b}$.

Note that when $c=0$, inequality (23) holds as soon as

(25) $\mathrm{pen}(m)=K\kappa^2\sigma^2(|m|+\Delta_m),\quad\forall m\in\mathcal{M}$.

Besides, by taking $a=(\log n)^{-1}$ we see that condition (22) becomes automatically
satisfied and by letting $b$ tend to $+\infty$, inequality (24) holds with pen given by (25)
and $R=\kappa^2\sigma^2\Sigma$.

The problem of selecting among histogram-type estimators in this regression
setting has recently been investigated in Sauvé (2008). Her selection procedure is
similar to ours with a different choice of the penalty term. Unlike hers, our penalty
does not involve any known upper bound on $|f|_\infty$.
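
The selection rule of this subsection can be sketched in a few lines of Python. This is a toy illustration under assumptions of ours, not the authors' implementation: the data, the three candidate partitions, the values of $K$ and $\sigma$, and the equal weights $\Delta_m=\log|\mathcal{M}|$ (so that $\Sigma=1$) are all hypothetical. The code uses the penalty shape (25), which corresponds to $c=0$, and minimizes the criterion (21).

```python
import math

KAPPA = 18.0  # numerical constant kappa of the paper

def fit_histogram(y, partition):
    """Least-squares estimator on S_m: average y over each block of m."""
    fitted = [0.0] * len(y)
    for block in partition:
        mean = sum(y[i] for i in block) / len(block)
        for i in block:
            fitted[i] = mean
    return fitted

def crit(y, partition, pen):
    """Penalized criterion (21): squared residual norm plus penalty."""
    fitted = fit_histogram(y, partition)
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) + pen

def select_model(y, partitions, penalties):
    """Return the index minimizing crit over the collection, and all scores."""
    scores = [crit(y, m, p) for m, p in zip(partitions, penalties)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy data (hypothetical): a two-level signal observed with small errors.
y = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
partitions = [
    [list(range(8))],                        # one block
    [list(range(0, 4)), list(range(4, 8))],  # two blocks of consecutive integers
    [[i] for i in range(8)],                 # eight singletons
]
K, sigma = 1.1, 0.1                          # assumed values, with c = 0
delta = math.log(len(partitions))            # equal weights, so that Sigma = 1
penalties = [K * KAPPA ** 2 * sigma ** 2 * (len(m) + delta) for m in partitions]
best, scores = select_model(y, partitions, penalties)
```

On this toy example the two-block partition balances the residual term against the penalty better than either the single block (large bias) or the singletons (large penalty).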

3.2. Families of piecewise polynomials. In this section, we assume that $f=
(F(x_1),\dots,F(x_n))$ where $x_i=i/n$ for $i=1,\dots,n$ and $F$ is an unknown function
on $(0,1]$. Our aim is to estimate $F$ by a piecewise polynomial of degree not larger
than $d$ based on a data-driven choice of a partition of $(0,1]$.

In the sequel, we shall consider partitions $m$ of $\{1,\dots,n\}$ such that each element
$I\in m$ consists of at least $d+1$ consecutive integers. For such a partition, $S_m$
denotes the linear span of vectors of the form $(P(1/n),\dots,P(n/n))$ where $P$ varies
among the space of piecewise polynomials with degree not larger than $d$ based on
the partition of $(0,1]$ given by

$\left\{\left(\frac{\min I-1}{n},\frac{\max I}{n}\right],\ I\in m\right\}$.

Consider a partition $\mathfrak{m}$ of $\{1,\dots,n\}$ and $\mathcal{M}$ a collection of partitions $m$ such that
$S_m\subset S_{\mathfrak{m}}$. We obtain the following result.

Proposition 2. Let $a,b>0$. Assume that

(26) $|I|\ge(d+1)a^2\log^2n\ge d+1,\quad\forall I\in\mathfrak{m}$.

If for some $K>1$,

$\mathrm{pen}(m)\ge K\kappa^2\left(\sigma^2+\frac{4\sqrt2\,c(\sigma+c)(d+1)(b+2)}{a\kappa}\right)(D_m+\Delta_m),\quad\forall m\in\mathcal{M}$,

the estimator $\hat f_{\hat m}$ satisfies (24) with

$R=\kappa^2\left(\sigma^2+\frac{4\sqrt2\,c(\sigma+c)(d+1)(b+2)}{a\kappa}\right)\Sigma+\frac{4(c+\sigma)^2(b+2)^2}{a^2n^b}$.

3.3. Families of trigonometric polynomials. We assume that $f$ has the same
form as in Subsection 3.2. Here, our aim is to estimate $F$ by a trigonometric
polynomial of degree not larger than some $D\ge0$.

Consider the (discrete) trigonometric system $\{\phi_j\}_{j\ge0}$ of vectors in $\mathbb{R}^n$ defined by

$\phi_0=(1/\sqrt n,\dots,1/\sqrt n)$

$\phi_{2j-1}=\sqrt{\frac2n}\left(\cos(2\pi jx_1),\dots,\cos(2\pi jx_n)\right),\quad\forall j\ge1$

$\phi_{2j}=\sqrt{\frac2n}\left(\sin(2\pi jx_1),\dots,\sin(2\pi jx_n)\right),\quad\forall j\ge1$.

Let $\mathcal{M}$ be a family of subsets of $\{0,\dots,2D\}$. For $m\in\mathcal{M}$, we define $S_m$ as the
linear span of the $\phi_j$ with $j\in m$ (with the convention $S_m=\{0\}$ when $m=\emptyset$).

Proposition 3. Let $a,b>0$. Assume that $2D+1\le\sqrt n/(a\log n)$. If for some
$K>1$,

$\mathrm{pen}(m)\ge K\kappa^2\left(\sigma^2+\frac{4c(c+\sigma)(b+2)}{a\kappa}\right)(D_m+\Delta_m),\quad\forall m\in\mathcal{M}$,

then $\hat f_{\hat m}$ satisfies (24) with

$R=\kappa^2\left(\sigma^2+\frac{4c(c+\sigma)(b+2)}{a\kappa}\right)\Sigma+\frac{4(b+2)^2(c+\sigma)^2}{a^2(2D+1)n^b}$.

4. Towards a more general result 

We consider the statistical framework presented in Section 3 and give a general
result that allows us to handle Propositions 1, 2 and 3 simultaneously. It relies on
some geometric properties of the linear spaces $S_m$ that we describe below.

4.1. Some metric quantities. Let $S$ be a linear subspace of $\mathbb{R}^n$. We associate to
$S$ the following quantities

(27) $\Lambda_2(S)=\max_{i=1,\dots,n}|\Pi_Se_i|_2$ and $\Lambda_\infty(S)=\max_{i=1,\dots,n}|\Pi_Se_i|_1$.

It is not difficult to see that these quantities can be interpreted in terms of norm
connections, more precisely

$\Lambda_2(S)=\sup_{t\in S\setminus\{0\}}\frac{|t|_\infty}{|t|_2}$ and $\Lambda_\infty(S)=\sup_{t\in\mathbb{R}^n\setminus\{0\}}\frac{|\Pi_St|_\infty}{|t|_\infty}$.

Clearly $\Lambda_2(S)\le1$. Besides, since $|x|_1\le\sqrt n|x|_2$ for all $x\in\mathbb{R}^n$, $\Lambda_\infty(S)\le\sqrt n\Lambda_2(S)$.
Nevertheless, these bounds can be rather rough: $\Lambda_2(S)$ and $\Lambda_\infty(S)$ turn out to be much smaller
for the linear spaces $S_m$ presented in Subsections 3.1, 3.2 and 3.3 (for the
examples presented there, we refer to Subsections 5.6, 5.7 and 5.8 respectively for more
accurate upper bounds on those quantities).
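
The quantities in (27) are easy to compute once the matrix of $\Pi_S$ is available. The Python sketch below is an illustration under our own choices: it takes $S$ spanned by the block-constant orthonormal vectors $u_j=(e_{(j-1)k+1}+\dots+e_{jk})/\sqrt k$ with $n=kD$, for which one expects $\Lambda_2(S)=\sqrt{D/n}$ and $\Lambda_\infty(S)=1$. The function names are hypothetical.

```python
import math

def block_basis(D, k):
    """Orthonormal family u_1..u_D of R^(kD): u_j is constant on the j-th block."""
    n = D * k
    return [[1.0 / math.sqrt(k) if j * k <= i < (j + 1) * k else 0.0
             for i in range(n)] for j in range(D)]

def projector(basis, n):
    """Matrix of Pi_S for an orthonormal basis of S: P = sum_j u_j u_j^T."""
    return [[sum(u[i] * u[j] for u in basis) for j in range(n)] for i in range(n)]

def lambda_2(P):
    """max_i |Pi_S e_i|_2, where Pi_S e_i is the i-th column of P."""
    n = len(P)
    return max(math.sqrt(sum(P[r][i] ** 2 for r in range(n))) for i in range(n))

def lambda_inf(P):
    """max_i |Pi_S e_i|_1, where Pi_S e_i is the i-th column of P."""
    n = len(P)
    return max(sum(abs(P[r][i]) for r in range(n)) for i in range(n))

D, k = 3, 4                      # hypothetical dimensions, n = 12
n = D * k
P = projector(block_basis(D, k), n)
lam2, lam_inf = lambda_2(P), lambda_inf(P)
```

Here the general bound $\Lambda_\infty(S)\le\sqrt n\Lambda_2(S)=\sqrt D$ is not tight, which illustrates why the sharper evaluations matter in the penalty.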

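4.2. The main result. Let $\{S_m, m\in\mathcal{M}\}$ be a family of linear spaces and $\{\Delta_m, m\in\mathcal{M}\}$
a family of nonnegative weights. We define $\mathcal{S}_n=\sum_{m\in\mathcal{M}}S_m$ and

$\Lambda_\infty=\left(\sup_{m,m'\in\mathcal{M}}\Lambda_\infty(S_m+S_{m'})\right)\vee1$.

Theorem 3. Let $K>1$ and $z\ge0$. Assume that for all $i=1,\dots,n$, inequality (20)
holds. Let pen be some penalty function satisfying

(28) $\mathrm{pen}(m)\ge K\kappa^2\left(\sigma^2+\frac{2cu}{\kappa}\right)(D_m+\Delta_m),\quad\forall m\in\mathcal{M}$

where

(29) $u=(c+\sigma)\Lambda_\infty\Lambda_2(\mathcal{S}_n)\log(n^2e^z)$.

If one selects $\hat m$ among $\mathcal{M}$ as any minimizer of crit(.) defined by (21) then

$\mathbb{E}\left[|f-\hat f_{\hat m}|_2^2\right]\le C(K)\inf_{m\in\mathcal{M}}\left[\mathbb{E}\left[|f-\hat f_m|_2^2\right]+\mathrm{pen}(m)\right]+R$

where

(30) $C(K)=\frac{K(K^2+K-1)}{(K-1)^3}$

and $R=\kappa^2\left(\sigma^2+2\kappa^{-1}cu\right)\Sigma+2u^2\Lambda_\infty^{-2}e^{-z}$.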

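When $c=0$ we derive the following corollary by letting $z$ grow towards infinity.

Corollary 1. Let $K>1$. Assume that the $\xi_i$ for $i=1,\dots,n$ satisfy inequality (20)
with $c=0$. If one selects $\hat m$ among $\mathcal{M}$ as a minimizer of crit defined by (21) with
pen satisfying

$\mathrm{pen}(m)\ge K\kappa^2\sigma^2(D_m+\Delta_m),\quad\forall m\in\mathcal{M}$

then

$\mathbb{E}\left[|f-\hat f_{\hat m}|_2^2\right]\le\frac{K(K^2+K-1)}{(K-1)^3}\inf_{m\in\mathcal{M}}\left[\mathbb{E}\left[|f-\hat f_m|_2^2\right]+\mathrm{pen}(m)\right]+R$

where $R=K^3(K-1)^{-2}\kappa^2\sigma^2\Sigma$.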



5. Proofs 



We start with the following result generalizing Theorem 2 when $d$ and $\delta$ are not
induced by norms. We assume that $T$ is finite and take numbers $v$ and $b$ such that

(31) $\sup_{s\in T}d(s,t_0)\le v,\quad\sup_{s\in T}c\delta(s,t_0)\le b$.

We consider now a family of finite partitions $(\mathcal{A}_k)_{k\ge0}$ of $T$, such that $\mathcal{A}_0=\{T\}$
and for $k\ge1$ and $A\in\mathcal{A}_k$

$d(s,t)\le2^{-k}v$ and $c\delta(s,t)\le2^{-k}b,\quad\forall s,t\in A$.

Besides, we assume $\mathcal{A}_k\subset\mathcal{A}_{k-1}$ for all $k\ge1$, which means that all elements $A\in\mathcal{A}_k$
are subsets of an element of $\mathcal{A}_{k-1}$. Finally, we define for $k\ge0$

$N_k=|\mathcal{A}_{k+1}||\mathcal{A}_k|$.

Theorem 4. Let $T$ be some finite set. Under Assumption 1,

(32) $\mathbb{P}\left(Z\ge H+2\sqrt{2v^2x}+2bx\right)\le e^{-x},\quad\forall x\ge0$

where

$H=\sum_{k\ge0}2^{-k}\left(v\sqrt{2\log(2^{k+1}N_k)}+b\log(2^{k+1}N_k)\right)$.

Moreover,

(33) $\mathbb{P}\left(\bar Z\ge H+2\sqrt{2v^2x}+2bx\right)\le2e^{-x},\quad\forall x\ge0$.



The quantity $H$ can be related to the entropies of $T$ with respect to the distances
$d$ and $c\delta$ (when $c\neq0$) in the following way. We first recall that for a distance $e(.,.)$
on $T$ and $\varepsilon>0$, the entropy $H(T,e,\varepsilon)$ is defined as the logarithm of the minimum
number of balls of radius $\varepsilon$ with respect to $e$ which are necessary to cover $T$. For
$\varepsilon>0$, let us set $H(T,\varepsilon)=\max\{H(T,d,\varepsilon v),H(T,c\delta,\varepsilon b)\}$. Note that $H(T,\varepsilon)=0$
for $\varepsilon>1$ because of (31). For $\varepsilon<1$, one can bound $H(T,\varepsilon)$ from above as follows.
For $k\ge0$, each element $A$ of the partition $\mathcal{A}_{k+1}$ is both a subset of a ball of
radius $2^{-(k+1)}v$ with respect to $d$ and of a ball of radius $2^{-(k+1)}b$ with respect to $c\delta$.
Since $|\mathcal{A}_{k+1}|\le N_k$, we obtain for all $\varepsilon\in[2^{-(k+1)},2^{-k})$, $H(T,\varepsilon)\le\log N_k$ and by
integrating with respect to $\varepsilon$ and summing over $k\ge0$, we get

$\int_0^{+\infty}\left(\sqrt{2v^2H(T,\varepsilon)}+bH(T,\varepsilon)\right)d\varepsilon\le H$.

5.1. Proof of Theorem 4. Note that we obtain (33) by using (32) twice (once 
with X t and then with — X t ). Let us now prove (32). For each k > 1 and A £ A k , 
we choose some arbitrary element t k (A) in A. For each ( £ T and k > 1, there 
exists a unique such that t € A and we set 7Tfc(t) = tfc(A). When k = 0, we 
set 7r (t) = to- 

We consider the (finite) decomposition 

x t - x to = ~y]Xn k+1 ( t ) - x nh ( t ) 

k>0 

and set for k > 

z k = 2- k ^ v /2(log(2«=+iJV fc )+a:) + & (log(2' £ + 1 iV fc ) + *)) 

Since X)fc>o z * — z = ^ + 2v\/2x + 2bx, 

P(Z>z) < P(3i, 3fc>0, X Wfc+l(t) - X^ (t) > « fc ) 

< ^ ^ P(X u -X s >z fc ) 
fe>o (s,«)e£; fc 

where 

Sfc = {(7r fc (t),7r fc+ i(t))| ieT}. 

Since $\mathcal{A}_{k+1} \subset \mathcal{A}_k$, $\pi_k(t)$ and $\pi_{k+1}(t)$ belong to the same element of $\mathcal{A}_k$ and therefore
$d(s,u) \le 2^{-k}v$ and $c\delta(s,u) \le 2^{-k}b$ for all pairs $(s,u) \in \mathcal{E}_k$. Besides, under
Assumption 1, the random variable $X = X_u - X_s$ with $(s,u) \in \mathcal{E}_k$ is centered and
satisfies (5) with $2^{-k}v$ and $2^{-k}b$ in place of $v$ and $c$. Hence, by using Bernstein's
inequality (3), we get for all $(s,u) \in \mathcal{E}_k$ and $k \ge 0$

\[ \mathbb{P}\left( X_u - X_s \ge z_k \right) \le 2^{-(k+1)} N_k^{-1} e^{-x} \le 2^{-(k+1)} |\mathcal{E}_k|^{-1} e^{-x}. \]

Finally, we obtain inequality (32) by summing up these inequalities over $(s,u) \in \mathcal{E}_k$ and
$k \ge 0$.
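As a numerical sanity check of the two facts used in this proof, the sketch below evaluates the levels $z_k$ and the constant $H$ for an illustrative sequence $N_k$, then verifies that $\sum_k z_k \le H + 2v\sqrt{2x} + 2bx$ and that the union bound adds up to at most $e^{-x}$. The function names, the truncation level `kmax` and the choice $\log N_k = (k+1)\log 5$ are assumptions made for the illustration; the expression of $H$ is taken to be the one of Theorem 4 as it is reused in Section 5.2.

```python
import math

def chaining_levels(v, b, x, log_N, kmax):
    """Return ([z_0, ..., z_kmax], H truncated at kmax).

    log_N(k) is a hypothetical callable returning log(N_k).
    """
    zs, H = [], 0.0
    for k in range(kmax + 1):
        L = (k + 1) * math.log(2.0) + log_N(k)      # log(2^{k+1} N_k)
        H += 2.0 ** (-k) * (v * math.sqrt(2.0 * L) + b * L)
        zs.append(2.0 ** (-k) * (v * math.sqrt(2.0 * (L + x)) + b * (L + x)))
    return zs, H

def union_bound_mass(x, kmax):
    """Sum over k of |E_k| * 2^{-(k+1)} |E_k|^{-1} e^{-x}; the |E_k| cancel."""
    return sum(2.0 ** (-(k + 1)) for k in range(kmax + 1)) * math.exp(-x)

# Illustrative parameters (assumptions, not taken from the paper).
v, b, x = 1.0, 0.5, 3.0
log_N = lambda k: (k + 1) * math.log(5.0)
zs, H = chaining_levels(v, b, x, log_N, kmax=40)
z = H + 2.0 * v * math.sqrt(2.0 * x) + 2.0 * b * x

print(sum(zs) <= z, union_bound_mass(x, 40) <= math.exp(-x))
```

The first comparison relies only on $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$ applied term by term, so it also holds for the truncated sums computed here.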

5.2. Proof of Theorem 2. We only prove (12), the argument for proving (13)
being the same as that for proving (33). For $t \in S$ and $r > 0$, we denote by
$B_2(t,r)$ and $B_\infty(t,r)$ the balls centered at $t$ of radius $r$ associated to $\|\cdot\|_2$ and $\|\cdot\|_\infty$
respectively. In the sequel, we shall use the following result on the entropy of those
balls.

Proposition 4. Let $\|\cdot\|$ be an arbitrary norm on $S$ and $B(0,1)$ the corresponding
unit ball. For each $\delta \in (0,1]$, the minimal number $N(\delta)$ of balls of radius $\delta$ (with
respect to $\|\cdot\|$) which are necessary to cover $B(0,1)$ satisfies

\[ N(\delta) \le (1 + 2\delta^{-1})^D. \]

This result can be found in Birgé (1983) (Lemma 4.5, p. 209) but we provide a
proof below to keep this paper as self-contained as possible.

Proof. With no loss of generality, we may assume that $S = \mathbb{R}^D$. Let $\delta \in (0,1]$. A
subset $T$ of $B(0,1)$ is called $\delta$-separated if for all $s,t \in T$ with $s \ne t$, $\|s - t\| > \delta$. If $T$ is
$\delta$-separated, the (open) balls centered at those $t \in T$ with radius $\delta/2$ are
all disjoint and included in the ball $B(0, 1 + \delta/2)$. By a volume argument (with
respect to the Lebesgue measure on $\mathbb{R}^D$), we deduce that $T$ is finite and satisfies
$|T| \le (1 + 2\delta^{-1})^D$. Consider now a maximal $\delta$-separated set $T$, that is

\[ |T| = \max_{T'} |T'| \]

where $T'$ runs among the family of all the $\delta$-separated subsets of $B(0,1)$. By definition,
for all $t \in B(0,1) \setminus T$, $T \cup \{t\}$ is no longer $\delta$-separated and therefore the
family of balls $\{B(t,\delta),\ t \in T\}$ covers $B(0,1)$. Consequently

\[ N(\delta) \le |T| \le (1 + 2\delta^{-1})^D. \]

□
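The argument of this proof can be illustrated numerically. The sketch below is an assumption-laden illustration, not the construction of the paper: it works with the Euclidean norm in dimension $D = 2$, replaces the unit ball by a finite random sample of it, and builds a $\delta$-separated set greedily, so the set is maximal only relative to the sample. The names `greedy_separated`, `points` and `centers` are hypothetical.

```python
import math
import random

def greedy_separated(points, delta):
    """Keep a point whenever it is at distance > delta from all points kept so far."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > delta for c in centers):
            centers.append(p)
    return centers

random.seed(0)
D, delta = 2, 0.25

# Finite sample of the Euclidean unit ball of R^2 (rejection sampling).
points = []
while len(points) < 2000:
    p = tuple(random.uniform(-1.0, 1.0) for _ in range(D))
    if math.hypot(*p) <= 1.0:
        points.append(p)

centers = greedy_separated(points, delta)
bound = (1.0 + 2.0 / delta) ** D          # (1 + 2/delta)^D from Proposition 4

# Every sampled point lies within delta of a kept center, since otherwise
# the greedy pass would have kept it.
covered = all(any(math.dist(p, c) <= delta for c in centers) for p in points)
print(len(centers) <= bound, covered)
```

The cardinality check is guaranteed by the volume argument, which applies to any $\delta$-separated subset of the unit ball, and the covering check mirrors the maximality step of the proof.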



Let us now turn to the proof of (12). Note that it is enough to prove that for some
$u < H + 2\sqrt{2v^2x} + 2bx$ and all finite sets $T$ satisfying inequalities (11) and (31)

\[ \mathbb{P}\left( \sup_{t\in T}(X_t - X_{t_0}) > u \right) \le e^{-x}. \]

Indeed, for any sequence $(T_n)_{n\ge 0}$ of finite subsets of $T$ increasing towards $T$, that
is, satisfying $T_n \subset T_{n+1}$ for all $n \ge 0$ and $\bigcup_{n\ge 0} T_n = T$, the sets

\[ \left\{ \sup_{t\in T_n}(X_t - X_{t_0}) > u \right\} \]

increase (for the inclusion) towards $\{Z > u\}$. Therefore,

\[ \mathbb{P}(Z > u) = \lim_{n\to+\infty} \mathbb{P}\left( \sup_{t\in T_n}(X_t - X_{t_0}) > u \right). \]

Consequently, we shall assume hereafter that $T$ is finite.

For $k \ge 0$ and $j \in \{2, \infty\}$ define the sets $\mathcal{A}_{j,k}$ as follows. We first consider the case
$j = 2$. For $k = 0$, $\mathcal{A}_{2,0} = \{T\}$. By applying Proposition 4 with $\|\cdot\| = \|\cdot\|_2/v$ and
$\delta = 1/4$, we can cover $T \subset B_2(t_0,v)$ with at most $9^D$ balls with radius $v/4$. From
such a finite covering $\{B_1, \ldots, B_N\}$ with $N \le 9^D$, it is easy to derive a partition
$\mathcal{A}_{2,1}$ of $T$ by at most $9^D$ sets of diameter not larger than $v/2$. Indeed, $\mathcal{A}_{2,1}$ can
merely consist of the non-empty sets among the family

\[ \Big( B_k \setminus \bigcup_{1\le \ell<k} B_\ell \Big) \cap T, \qquad k = 1, \ldots, N \]

(with the convention $\bigcup_{\emptyset} = \emptyset$). Then, for $k \ge 2$, proceed by induction using
Proposition 4 repeatedly. Each element $A \in \mathcal{A}_{2,k-1}$ is a subset of a ball of radius
$2^{-k}v$ and can be partitioned similarly as before into $5^D$ subsets of balls of radii
$2^{-(k+1)}v$. By doing so, the partitions $\mathcal{A}_{2,k}$ with $k \ge 1$ satisfy $\mathcal{A}_{2,k} \subset \mathcal{A}_{2,k-1}$,
$|\mathcal{A}_{2,k}| \le (1.8)^D \times 5^{kD}$ and for all $A \in \mathcal{A}_{2,k}$,

\[ \sup_{s,t\in A} \|s - t\|_2 \le 2^{-k}v. \]

Let us now turn to the case $j = +\infty$. If $c > 0$, define the partitions $\mathcal{A}_{\infty,k}$ in exactly
the same way as we did for the $\mathcal{A}_{2,k}$. Similarly, the partitions $\mathcal{A}_{\infty,k}$ with $k \ge 1$
satisfy $\mathcal{A}_{\infty,k} \subset \mathcal{A}_{\infty,k-1}$, $|\mathcal{A}_{\infty,k}| \le (1.8)^D \times 5^{kD}$ and for all $A \in \mathcal{A}_{\infty,k}$,

\[ \sup_{s,t\in A} c\|s - t\|_\infty \le 2^{-k}b. \]

When $c = 0$, we simply take $\mathcal{A}_{\infty,k} = \{T\}$ for all $k \ge 0$ and note that the properties
above are fulfilled as well.
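The step that turns a covering $\{B_1, \ldots, B_N\}$ into a partition, namely keeping the non-empty sets $(B_k \setminus \bigcup_{\ell<k} B_\ell) \cap T$, amounts to assigning each point to the first ball that contains it. A minimal sketch follows, assuming the Euclidean distance in the plane and purely illustrative data; `partition_from_covering`, `diameter`, the grid of points and the grid of centers are hypothetical choices.

```python
import math

def partition_from_covering(points, centers, radius):
    """Cell k collects the points of B_k that belong to no earlier ball."""
    cells = {}
    for p in points:
        for k, c in enumerate(centers):
            if math.dist(p, c) <= radius:
                cells.setdefault(k, []).append(p)
                break
        else:
            raise ValueError("the balls do not cover the point set")
    return list(cells.values())       # only non-empty cells are stored

def diameter(cell):
    return max(math.dist(s, t) for s in cell for t in cell)

# Illustrative data: grid points of the unit disc, covered by balls of radius 0.25
# centered on a grid of mesh 0.3 (mesh * sqrt(2) / 2 <= radius, so the balls cover).
points = [(i / 10.0, j / 10.0) for i in range(-10, 11) for j in range(-10, 11)
          if math.hypot(i / 10.0, j / 10.0) <= 1.0]
radius = 0.25
centers = [(0.3 * a, 0.3 * b) for a in range(-4, 5) for b in range(-4, 5)]

cells = partition_from_covering(points, centers, radius)
print(len(cells), max(diameter(c) for c in cells) <= 2 * radius + 1e-12)
```

Each cell is contained in one ball, so its diameter is at most twice the radius, which is the property used to pass from balls of radius $v/4$ to sets of diameter at most $v/2$.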

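Finally, define the partition $\mathcal{A}_k$ for $k \ge 0$ as that generated by $\mathcal{A}_{2,k}$ and $\mathcal{A}_{\infty,k}$,
that is

\[ \mathcal{A}_k = \{ A_2 \cap A_\infty \mid A_2 \in \mathcal{A}_{2,k},\ A_\infty \in \mathcal{A}_{\infty,k} \}. \]

Clearly, $\mathcal{A}_{k+1} \subset \mathcal{A}_k$. Besides, $|\mathcal{A}_0| = 1$ and for $k \ge 1$,

\[ |\mathcal{A}_k| \le |\mathcal{A}_{2,k}|\,|\mathcal{A}_{\infty,k}| \le (1.8)^{2D} \times 5^{2kD}. \]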

The set $T$ being finite, we can apply Theorem 4. Actually, our construction of
the $\mathcal{A}_k$ allows us to slightly gain in the constants. Going back to the proof of
Theorem 4, we note that

\[ |\mathcal{E}_k| = \left| \{ (\pi_k(t), \pi_{k+1}(t)) \mid t \in T \} \right| \le |\mathcal{A}_{k+1}| \le 9^{2D} \times 5^{2kD} \]

since the element $\pi_{k+1}(t)$ determines $\pi_k(t)$ in a unique way. This means that one
can take $N_k = 9^{2D} \times 5^{2kD}$ in the proof of Theorem 4. With the notation of
Theorem 4, we have

\begin{align*}
H &\le \sum_{k\ge 0} 2^{-k} \left( v\sqrt{2\log\left(2^{k+1} \times 9^{2D} \times 5^{2kD}\right)} + b\log\left(2^{k+1} \times 9^{2D} \times 5^{2kD}\right) \right) \\
&\le 14\sqrt{Dv^2} + 18Db
\end{align*}

and using the concavity of $x \mapsto \sqrt{x}$, we get

\begin{align*}
H + 2\sqrt{2v^2x} + 2bx &\le 14\sqrt{Dv^2} + 2\sqrt{2v^2x} + 18b(D + x) \\
&\le 18\left( \sqrt{v^2(D + x)} + b(D + x) \right),
\end{align*}

which leads to the result.
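The two numerical bounds above can be checked by direct evaluation of the series. The sketch below is only a floating-point sanity check over an arbitrary grid of parameters, not a proof; the function names, the truncation level `kmax` and the grid are assumptions.

```python
import math

def H_series(v, b, D, kmax=200):
    """Truncated series with N_k = 9^{2D} 5^{2kD}; the tail beyond kmax is negligible."""
    total = 0.0
    for k in range(kmax + 1):
        # log(2^{k+1} * 9^{2D} * 5^{2kD}), computed without forming the huge product
        log_term = ((k + 1) * math.log(2.0) + 2 * D * math.log(9.0)
                    + 2 * k * D * math.log(5.0))
        total += 2.0 ** (-k) * (v * math.sqrt(2.0 * log_term) + b * log_term)
    return total

def check(v, b, D, x, kappa=18.0):
    H = H_series(v, b, D)
    step1 = H <= 14.0 * math.sqrt(D * v * v) + 18.0 * D * b
    lhs = H + 2.0 * math.sqrt(2.0 * v * v * x) + 2.0 * b * x
    step2 = lhs <= kappa * (math.sqrt(v * v * (D + x)) + b * (D + x))
    return step1, step2

results = [check(v, b, D, x)
           for v in (0.5, 1.0, 3.0) for b in (0.0, 0.2, 2.0)
           for D in (1, 2, 10, 100) for x in (0.0, 1.0, 25.0)]
print(all(s1 and s2 for s1, s2 in results))
```

For $D = 1$ the coefficient of $b$ in the series equals $4\log 2 + 4\log 9 + 4\log 5 \approx 17.999$, which shows that the constant 18 is essentially tight for that term.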



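5.3. Control of $\chi^2$-type random variables. We have the following result.

Theorem 5. Let $S$ be some linear subspace of $\mathbb{R}^n$ with dimension $D$. If the coordinates
of $\varepsilon$ are independent and satisfy (20), for all $x, u > 0$,

\[ \mathbb{P}\left[ |\Pi_S\varepsilon|_2^2 \ge \kappa^2\left( \sigma^2 + \frac{2cu}{\kappa} \right)(D + x),\ |\Pi_S\varepsilon|_\infty \le u \right] \le e^{-x} \tag{34} \]

with $\kappa = 18$ and

\[ \mathbb{P}\left( |\Pi_S\varepsilon|_\infty \ge x \right) \le 2n\exp\left[ -\frac{x^2}{2\Lambda_2^2(S)(\sigma^2 + cx)} \right] \tag{35} \]

where $\Lambda_2(S)$ is defined by (27).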



Proof. Let us set $\chi = |\Pi_S\varepsilon|_2$. For $t \in S$, let $X_t = \langle\varepsilon,t\rangle$ and $t_0 = 0$. It follows from
the independence of the $\varepsilon_i$ and inequality (20) that (11) holds with $d(t,s) = \sigma|t - s|_2$
and $\delta(t,s) = |t - s|_\infty$, for all $s,t \in S$. The random variable $\chi$ equals the supremum
of the $X_t$ when $t$ runs among the unit ball of $S$. Besides, the supremum is achieved
for $t = \Pi_S\varepsilon/\chi$ and thus, on the event $\{\chi \ge z,\ |\Pi_S\varepsilon|_\infty \le u\}$,

\[ \chi = \sup_{t\in T} X_t \quad\text{with}\quad T = \left\{ t \in S,\ |t|_2 \le 1,\ |t|_\infty \le uz^{-1} \right\} \]

leading to the bound

\[ \mathbb{P}\left( \chi \ge z,\ |\Pi_S\varepsilon|_\infty \le u \right) \le \mathbb{P}\left( \sup_{t\in T} X_t \ge z \right). \]

We take $z = \kappa\sqrt{(\sigma^2 + 2cu\kappa^{-1})(D + x)}$ and (using the concavity of $x \mapsto \sqrt{x}$) note
that

\[ z \ge \kappa\left( \sqrt{\sigma^2(D + x)} + cuz^{-1}(D + x) \right). \]

Then, by applying Theorem 2 with $v = \sigma$, $b = cu/z$, we obtain (34).
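The lower bound on $z$ can also be verified by elementary algebra, as an alternative to the concavity argument. The shorthand $A$ and $B$ below is introduced only for this check and is not part of the original notation.

```latex
Set $A = \kappa\sqrt{\sigma^2(D+x)}$ and $B = \kappa c u (D+x)$, so that $z^2 = A^2 + 2B$ and $z > 0$. Then
\begin{align*}
z \ge A + \frac{B}{z}
\iff z^2 - B \ge Az
\iff A^2 + B \ge A\sqrt{A^2 + 2B},
\end{align*}
and the last inequality holds because
\[
(A^2 + B)^2 = A^4 + 2A^2B + B^2 \ge A^4 + 2A^2B = A^2\,(A^2 + 2B).
\]
```

Since $A + B/z = \kappa\big(\sqrt{\sigma^2(D+x)} + cuz^{-1}(D+x)\big)$, this is exactly the displayed inequality.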

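Let us now turn to (35). Under (20), we can apply Bernstein's inequality (3) to
$X = \langle\varepsilon,t\rangle$ and $X = \langle-\varepsilon,t\rangle$ with $t \in S$, $v^2 = \sigma^2|t|_2^2$ and $c|t|_\infty$ in place of $c$ and get
for all $t \in S$ and $x > 0$

\[ \mathbb{P}\left( |\langle\varepsilon,t\rangle| \ge x \right) \le 2\exp\left[ -\frac{x^2}{2\left( \sigma^2|t|_2^2 + c|t|_\infty x \right)} \right]. \tag{36} \]

Let us take $t = \Pi_S e_i$ with $i \in \{1, \ldots, n\}$. Since $|t|_2 \le \Lambda_2(S)$ and

\[ |t|_\infty = \max_{i'=1,\ldots,n} |\langle\Pi_S e_i, e_{i'}\rangle| = \max_{i'=1,\ldots,n} |\langle\Pi_S e_i, \Pi_S e_{i'}\rangle| \le \Lambda_2^2(S), \]

for all $i \in \{1, \ldots, n\}$

\[ \mathbb{P}\left( |\langle\Pi_S\varepsilon, e_i\rangle| \ge x \right) \le 2\exp\left[ -\frac{x^2}{2\Lambda_2^2(S)(\sigma^2 + cx)} \right]. \]

We obtain (35) by summing up these probabilities for $i = 1, \ldots, n$.

□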



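5.4. Proof of Theorem 3. Let us fix some $m \in \mathcal{M}$. It follows from simple algebra
and the inequality $\mathrm{crit}(\hat m) \le \mathrm{crit}(m)$ that

\[ |f - \hat f_{\hat m}|_2^2 \le |f - \hat f_m|_2^2 + 2\langle\varepsilon, \hat f_{\hat m} - \hat f_m\rangle + \mathrm{pen}(m) - \mathrm{pen}(\hat m). \]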



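Using the elementary inequality $2ab \le a^2 + b^2$ for all $a, b \in \mathbb{R}$, we have for $K > 1$,

\begin{align*}
2\langle\varepsilon, \hat f_{\hat m} - \hat f_m\rangle
&\le 2\,|\hat f_{\hat m} - \hat f_m|_2\,|\Pi_{S_m+S_{\hat m}}\varepsilon|_2 \\
&\le K\,|\Pi_{S_m+S_{\hat m}}\varepsilon|_2^2 + K^{-1}\,|\hat f_{\hat m} - \hat f_m|_2^2 \\
&\le K\,|\Pi_{S_m+S_{\hat m}}\varepsilon|_2^2 + K^{-1}\left[ \left( 1 + \frac{K-1}{K} \right)|\hat f_{\hat m} - f|_2^2 + \left( 1 + \frac{K}{K-1} \right)|f - \hat f_m|_2^2 \right]
\end{align*}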



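and we derive

\begin{align*}
\frac{(K-1)^2}{K^2}\,|f - \hat f_{\hat m}|_2^2
&\le \frac{K^2 + K - 1}{K(K-1)}\,|f - \hat f_m|_2^2 + K\,|\Pi_{S_m+S_{\hat m}}\varepsilon|_2^2 - \left( \mathrm{pen}(\hat m) - \mathrm{pen}(m) \right) \\
&\le \frac{K^2 + K - 1}{K(K-1)}\,|f - \hat f_m|_2^2 + 2\,\mathrm{pen}(m) + K\,|\Pi_{S_m+S_{\hat m}}\varepsilon|_2^2 - \left( \mathrm{pen}(\hat m) + \mathrm{pen}(m) \right).
\end{align*}

Setting

\[ A_1(\hat m) = K\left[ |\Pi_{S_m+S_{\hat m}}\varepsilon|_2^2 - \kappa^2\left( \sigma^2 + \frac{2cu}{\kappa} \right)\left( D_m + D_{\hat m} + \Delta_{\hat m} \right) \right]_+ \mathbf{1}\left\{ |\Pi_{S_m+S_{\hat m}}\varepsilon|_\infty \le u \right\}, \]

\[ A_2(\hat m) = K\,|\Pi_{S_m+S_{\hat m}}\varepsilon|_2^2\ \mathbf{1}\left\{ |\Pi_{S_m+S_{\hat m}}\varepsilon|_\infty \ge u \right\} \]

and using (28), we deduce that

\[ \frac{(K-1)^2}{K^2}\,|f - \hat f_{\hat m}|_2^2 \le \frac{K^2 + K - 1}{K(K-1)}\,|f - \hat f_m|_2^2 + 2\,\mathrm{pen}(m) + A_1(\hat m) + A_2(\hat m), \]

and by taking the expectation on both sides we get

\[ \frac{(K-1)^2}{K^2}\,\mathbb{E}\left[ |f - \hat f_{\hat m}|_2^2 \right] \le \frac{K^2 + K - 1}{K(K-1)}\,\mathbb{E}\left[ |f - \hat f_m|_2^2 \right] + 2\,\mathrm{pen}(m) + \mathbb{E}\left[ A_1(\hat m) \right] + \mathbb{E}\left[ A_2(\hat m) \right]. \]

The index $m$ being arbitrary, it remains to bound $E_1 = \mathbb{E}[A_1(\hat m)]$ and $E_2 =
\mathbb{E}[A_2(\hat m)]$ from above.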

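Let $m'$ be some deterministic index in $\mathcal{M}$. By using Theorem 5 with $S = S_m + S_{m'}$,
the dimension of which is not larger than $D_m + D_{m'}$, and integrating (34) with
respect to $x$ we get

\[ \mathbb{E}\left[ A_1(m') \right] \le K\kappa^2\left( \sigma^2 + \frac{2cu}{\kappa} \right) e^{-\Delta_{m'}} \]

and thus

\[ E_1 \le \sum_{m'\in\mathcal{M}} \mathbb{E}\left[ A_1(m') \right] \le K\kappa^2\left( \sigma^2 + \frac{2cu}{\kappa} \right)\Sigma. \]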



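Let us now turn to $\mathbb{E}[A_2(\hat m)]$. By using that $S_{\hat m} + S_m \subset S_n$, $|\Pi_{S_{\hat m}+S_m}\varepsilon|_2^2 \le
|\Pi_{S_n}\varepsilon|_2^2 \le n|\Pi_{S_n}\varepsilon|_\infty^2$. Besides, it follows from the definition of $\Lambda_\infty$ that

\[ |\Pi_{S_{\hat m}+S_m}\varepsilon|_\infty = |\Pi_{S_{\hat m}+S_m}\Pi_{S_n}\varepsilon|_\infty \le \Lambda_\infty\,|\Pi_{S_n}\varepsilon|_\infty \]

and therefore, setting $x_0 = \Lambda_\infty^{-1}u$,

\[ E_2 \le Kn\,\mathbb{E}\left[ |\Pi_{S_n}\varepsilon|_\infty^2\ \mathbf{1}\left\{ |\Pi_{S_n}\varepsilon|_\infty \ge x_0 \right\} \right]. \]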

We shall now use the following lemma, the proof of which is deferred to the end of
the section.

Lemma 1. Let $X$ be some nonnegative random variable satisfying for all $x > 0$,

\[ \mathbb{P}(X \ge x) \le a\exp\left[ -\phi(x) \right] \quad\text{with}\quad \phi(x) = \frac{x^2}{2(\alpha + \beta x)} \tag{37} \]

where $a, \alpha > 0$ and $\beta \ge 0$. For $x_0 > 0$ such that $\phi(x_0) \ge 1$,

\[ \mathbb{E}\left[ X^p\,\mathbf{1}\{X \ge x_0\} \right] \le a\,x_0^p\,e^{-\phi(x_0)}\left( 1 + \frac{e\,p!}{\phi(x_0)} \right), \qquad \forall p \ge 1. \]

We apply the lemma with $p = 2$ and $X = |\Pi_{S_n}\varepsilon|_\infty$ for which we know from (35)
that (37) holds with $a = 2n$, $\alpha = \Lambda_2^2(S_n)\sigma^2$ and $\beta = \Lambda_2^2(S_n)c$. Besides, it follows from
the definition of $x_0$ and the fact that $n \ge 2$ that
The assumptions of Lemma 1 being checked, we deduce that $E_2 \le 2Kx_0^2e^{-z}$ and
conclude the proof putting these upper bounds on $E_1$ and $E_2$ together.

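Let us now turn to the proof of the lemma.

Proof of Lemma 1. Since

\[ \mathbb{E}\left[ X^p\,\mathbf{1}\{X \ge x_0\} \right] \le x_0^p\,\mathbb{P}(X \ge x_0) + \int_{x_0}^{+\infty} p\,x^{p-1}\,\mathbb{P}(X \ge x)\,dx, \]

it remains to bound from above the integral. Let us set

\[ I_p = \int_{x_0}^{+\infty} p\,x^{p-1}e^{-\phi(x)}\,dx. \]

Note that $\phi'$ is increasing and by integrating by parts we have

\[ I_p = \int_{x_0}^{+\infty} \frac{p\,x^{p-1}}{\phi'(x)}\,\phi'(x)e^{-\phi(x)}\,dx \le \frac{p}{\phi'(x_0)}\left[ x_0^{p-1}e^{-\phi(x_0)} + I_{p-1} \right]. \]

By induction over $p$ and using that $x_0\phi'(x_0) \ge \phi(x_0) \ge 1$ we get

\[ I_p \le p!\,x_0^p\,e^{-\phi(x_0)} \sum_{k=0}^{p-1} \frac{\left( x_0\phi'(x_0) \right)^{-(k+1)}}{(p-k-1)!} \le \frac{e\,p!\,x_0^p\,e^{-\phi(x_0)}}{\phi(x_0)}. \]

□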
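The final bound of this proof, $I_p \le e\,p!\,x_0^p e^{-\phi(x_0)}/\phi(x_0)$ for $\phi(x) = x^2/(2(\alpha + \beta x))$ and $\phi(x_0) \ge 1$, can be compared with a numerical evaluation of $I_p$. The sketch below uses a plain trapezoidal rule on a truncated interval; the function names, the integration window and the parameter values are assumptions chosen for illustration, and the check is numerical only.

```python
import math

def phi(x, alpha, beta):
    return x * x / (2.0 * (alpha + beta * x))

def I_p(p, x0, alpha, beta, width=200.0, steps=200_000):
    """Trapezoidal approximation of the integral of p x^{p-1} exp(-phi(x)) over [x0, x0 + width]."""
    h = width / steps
    f = lambda x: p * x ** (p - 1) * math.exp(-phi(x, alpha, beta))
    s = 0.5 * (f(x0) + f(x0 + width))
    for i in range(1, steps):
        s += f(x0 + i * h)
    return s * h

def lemma_bound(p, x0, alpha, beta):
    """Right-hand side e * p! * x0^p * exp(-phi(x0)) / phi(x0)."""
    ph = phi(x0, alpha, beta)
    return math.e * math.factorial(p) * x0 ** p * math.exp(-ph) / ph

# Illustrative values with phi(x0) = 1.6 >= 1 (Bernstein-type tail, beta > 0).
alpha, beta, x0 = 1.0, 1.0, 4.0
for p in (1, 2, 3):
    print(p, I_p(p, x0, alpha, beta) <= lemma_bound(p, x0, alpha, beta))
```

Taking $\beta = 0$ gives the purely Gaussian tail, for which $I_2 = 2\alpha e^{-\phi(x_0)}$ can be computed in closed form and compared with the same bound.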



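5.5. An intermediate result. The following proposition allows us to bound $\Lambda_2(S)$
and $\Lambda_\infty(S)$ under suitable assumptions on an orthonormal basis of $S$.

Proposition 5. Let $P$ be some partition of $\{1, \ldots, n\}$, $J$ some nonempty index set
and

\[ \left\{ \phi_{j,I},\ (j,I) \in J \times P \right\} \]

an orthonormal system such that for some $\Phi > 0$ and all $I \in P$

\[ \sup_{j\in J} |\phi_{j,I}|_\infty \le \frac{\Phi}{\sqrt{|I|}} \quad\text{and}\quad \langle\phi_{j,I}, e_i\rangle = 0 \quad \forall i \notin I. \]

If $S$ is the linear span of the $\phi_{j,I}$ with $(j,I) \in J \times P$,

\[ \Lambda_2^2(S) \le \left( \frac{|J|\Phi^2}{\min_{I\in P}|I|} \right) \wedge 1 \quad\text{and}\quad \Lambda_\infty(S) \le \left( |J|\Phi^2 \right) \wedge \left( \sqrt{n}\,\Lambda_2(S) \right). \]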

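Proof of Proposition 5. We have already seen that $\Lambda_2(S) \le 1$ and $\Lambda_\infty(S) \le \sqrt{n}\,\Lambda_2(S)$,
so it remains to show that

\[ \Lambda_2^2(S) \le \frac{|J|\Phi^2}{\min_{I\in P}|I|} \quad\text{and}\quad \Lambda_\infty(S) \le |J|\Phi^2. \]

Let $i \in \{1, \ldots, n\}$. There exists some unique $I \in P$ such that $i \in I$ and since
$\langle\phi_{j,I'}, e_i\rangle = 0$ for all $I' \ne I$, $\Pi_S e_i = \sum_{j\in J} \langle e_i, \phi_{j,I}\rangle\,\phi_{j,I}$. Consequently

\[ |\Pi_S e_i|_2^2 = \sum_{j\in J} \langle e_i, \phi_{j,I}\rangle^2 \le \frac{|J|\Phi^2}{|I|} \le \frac{|J|\Phi^2}{\min_{I\in P}|I|} \]

and

\[ |\Pi_S e_i|_1 = \sum_{i'\in I} \Big| \sum_{j\in J} \langle e_i, \phi_{j,I}\rangle\langle e_{i'}, \phi_{j,I}\rangle \Big| \le |I|\,\frac{|J|\Phi^2}{|I|} = |J|\Phi^2. \]

We conclude since $i$ is arbitrary. □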

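5.6. Proof of Proposition 1. Let $m$ be some partition of $\{1, \ldots, n\}$. By applying
Proposition 5 with $J = \{1\}$, $P = m$ and $\Phi = 1$, we obtain

\[ \Lambda_2^2(S_m) \le \frac{1}{\min_{I\in m}|I|} \quad\text{and}\quad \Lambda_\infty(S_m) \le 1. \]

In fact, one can check that these inequalities are equalities. Since for all $m \in \mathcal{M}$,
$S_m \subset S_{\overline{m}}$, we deduce that under (22)

\[ \Lambda_2^2(S_n) \le \Lambda_2^2(S_{\overline{m}}) \le \frac{1}{a^2\log^2 n}. \]

For two partitions $m, m'$ of $\{1, \ldots, n\}$, define

\[ m \vee m' = \left\{ I \cap I' \mid I \in m,\ I' \in m' \right\}. \tag{38} \]

Since the elements of $m, m'$ for $m, m' \in \mathcal{M}$ consist of consecutive integers, $S_{m\vee m'} =
S_m + S_{m'}$ and therefore

\[ \Lambda_\infty = \sup_{m,m'\in\mathcal{M}} \Lambda_\infty(S_m + S_{m'}) = \sup_{m,m'\in\mathcal{M}} \Lambda_\infty(S_{m\vee m'}) = 1. \]

The result follows by applying Theorem 3 with $z = b\log n$.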
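The claim that the two inequalities for piecewise constant models are in fact equalities can be checked on a small example. The sketch below assumes that $\Lambda_2(S) = \max_i |\Pi_S e_i|_2$ and that $\Lambda_\infty(S) = \max_i |\Pi_S e_i|_1$, which is how these quantities are used in the proofs above; definition (27) itself is not reproduced in this section, so both formulas are labelled as assumptions. Indices are 0-based and the partition is an arbitrary illustration.

```python
def projection_matrix(blocks, n):
    """Projector onto vectors that are constant on each block: entry 1/|I| when i, i' share a block I."""
    P = [[0.0] * n for _ in range(n)]
    for I in blocks:
        for i in I:
            for j in I:
                P[i][j] = 1.0 / len(I)
    return P

def lambda_2_sq(P):
    # Assumed definition: max over i of the squared Euclidean norm of row i (= Pi_S e_i).
    return max(sum(e * e for e in row) for row in P)

def lambda_inf(P):
    # Assumed definition: max over i of the l1 norm of row i.
    return max(sum(abs(e) for e in row) for row in P)

# Partition of {0, ..., 11} into blocks of consecutive integers of sizes 3, 5, 4.
blocks = [range(0, 3), range(3, 8), range(8, 12)]
P = projection_matrix(blocks, 12)

min_block = min(len(I) for I in blocks)
print(abs(lambda_2_sq(P) - 1.0 / min_block) < 1e-12, abs(lambda_inf(P) - 1.0) < 1e-12)
```

Row $i$ of the projector has $|I|$ entries equal to $1/|I|$, so its squared Euclidean norm is $1/|I|$ and its $\ell_1$ norm is 1, which gives the two equalities.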

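5.7. Proof of Proposition 2. Let $m$ be a partition of $\{1, \ldots, n\}$ such that for
all $I \in m$, $I$ consists of consecutive integers and $|I| > d$. As proved in Mason &
Handscomb (2003), an orthonormal basis of $S_m$ is given by the vectors $\phi_{j,I}$ defined
by

\[ \langle\phi_{0,I}, e_i\rangle = \frac{1}{\sqrt{|I|}}\,\mathbf{1}_I(i) \]

and for $j = 1, \ldots, d$

\[ \langle\phi_{j,I}, e_i\rangle = \sqrt{\frac{2}{|I|}}\ Q_j\left( \cos\left( \frac{(i - \min I + 1/2)\pi}{|I|} \right) \right)\mathbf{1}_I(i) \]

where $Q_j$ is the Chebyshev polynomial of degree $j$ defined on $[-1,1]$ by the formula

\[ Q_j(x) = \cos(j\theta) \quad\text{if } x = \cos\theta. \]

By applying Proposition 5 with $\Phi = \sqrt{2}$, $P = m$ and $J = \{0, \ldots, d\}$, we get

\[ \Lambda_2^2(S_m) \le \frac{2(d+1)}{\min_{I\in m}|I|} \quad\text{and}\quad \Lambda_\infty(S_m) \le 2(d+1). \]

Since for those $m \in \mathcal{M}$, $S_m \subset S_{\overline{m}}$, $S_n = \sum_{m\in\mathcal{M}} S_m \subset S_{\overline{m}}$ and therefore

\[ \Lambda_2^2(S_n) \le \Lambda_2^2(S_{\overline{m}}) \le \frac{1}{a^2\log^2 n}. \]

Moreover, since the elements of $m$ and $m'$ for $m, m' \in \mathcal{M}$ consist of consecutive
integers, $S_m + S_{m'} = S_{m\vee m'}$ where $m \vee m'$ is defined by (38) and

\[ \sup_{m,m'\in\mathcal{M}} \Lambda_\infty(S_m + S_{m'}) = \sup_{m,m'\in\mathcal{M}} \Lambda_\infty(S_{m\vee m'}) \le 2(d+1) \]

which implies that $\Lambda_\infty \le 2(d+1)$. It remains to apply Theorem 3 with $z = b\log n$.

5.8. Proof of Proposition 3. Let $\overline{m} = \{0, \ldots, 2\overline{D}\}$. Under the assumption that
$2\overline{D} + 1 \le \sqrt{n}/(a\log n)$, for all $m \subset \overline{m}$, the family of vectors $\{\phi_j\}_{j\in m}$ is an orthonormal
basis of $S_m$. By applying Proposition 5 with $P$ reduced to $\{\{1, \ldots, n\}\}$, $J = m$,
$\Phi = \sqrt{2}$, we get

\[ \Lambda_2^2(S_m) \le \frac{2|m|}{n} \quad\text{and}\quad \Lambda_\infty(S_m) \le \sqrt{n}\,\Lambda_2(S_m) \le \sqrt{2|m|}. \]

Since for all $m \in \mathcal{M}$, $S_m \subset S_{\overline{m}}$, $S_n = \sum_{m\in\mathcal{M}} S_m \subset S_{\overline{m}}$ and therefore

\[ \Lambda_2^2(S_n) \le \Lambda_2^2(S_{\overline{m}}) \le \frac{2(2\overline{D} + 1)}{n}. \]

Moreover, for all $m, m' \in \mathcal{M}$, $S_m + S_{m'} = S_{m\cup m'}$ with $m \cup m' \subset \overline{m}$ and thus,

\[ \Lambda_\infty(S_m + S_{m'}) \le \sqrt{2|m \cup m'|} \le \sqrt{2(2\overline{D} + 1)}. \]

It remains to apply Theorem 3 with $z = b\log n$.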
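Returning to the Chebyshev system used in Section 5.7, its orthonormality on a single block of consecutive integers can be checked numerically. The sketch below is an illustration under stated assumptions: the block is relabelled $\{0, \ldots, N-1\}$ so that $i - \min I$ runs over these values, and the names `chebyshev_block_basis` and `gram` are hypothetical.

```python
import math

def chebyshev_block_basis(N, d):
    """Vectors phi_0, ..., phi_d restricted to one block of N consecutive integers (d < N)."""
    basis = [[1.0 / math.sqrt(N)] * N]
    for j in range(1, d + 1):
        # Q_j(cos(theta)) = cos(j * theta) with theta = (i + 1/2) * pi / N
        basis.append([math.sqrt(2.0 / N) * math.cos(j * (i + 0.5) * math.pi / N)
                      for i in range(N)])
    return basis

def gram(basis):
    return [[sum(a * b for a, b in zip(u, w)) for w in basis] for u in basis]

N, d = 8, 3
basis = chebyshev_block_basis(N, d)
G = gram(basis)

is_orthonormal = all(abs(G[a][b] - (1.0 if a == b else 0.0)) < 1e-12
                     for a in range(d + 1) for b in range(d + 1))
sup_norm = max(abs(e) for vec in basis for e in vec)
print(is_orthonormal, sup_norm <= math.sqrt(2.0 / N) + 1e-12)
```

The second check corresponds to the condition $\sup_j |\phi_{j,I}|_\infty \le \Phi/\sqrt{|I|}$ of Proposition 5 with $\Phi = \sqrt{2}$, and the requirement $d < N$ reflects the assumption $|I| > d$.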

Acknowledgment: We are grateful to Jonas Kahn for pointing out the counter-example
in Subsection 2.2 and to Lucien Birgé for his useful comments and for making us
aware of the book of Talagrand which has been the starting point of this paper.